Goto

Collaborating Authors

 program verification



Learning Loop Invariants for Program Verification

Neural Information Processing Systems

The problem is undecidable and even practical instances are challenging. Inspired by how human experts construct loop invariants, we propose a reasoning framework Code2Inv that constructs the solution by multi-step decision making and querying an external program graph memory block. By training with reinforcement learning, Code2Inv captures rich program features and avoids the need for ground truth solutions as supervision. Compared to previous learning tasks in domains with graph-structured data, it addresses unique challenges, such as a binary objective function and an extremely sparse reward that is given by an automated theorem prover only after the complete loop invariant is proposed. We evaluate Code2Inv on a suite of 133 benchmark problems and compare it to three state-of-the-art systems. It solves 106 problems compared to 73 by a stochastic search-based system, 77 by a heuristic search-based system, and 100 by a decision tree learning-based system. Moreover, the strategy learned can be generalized to new programs: compared to solving new instances from scratch, the pre-trained agent is more sample efficient in finding solutions.


VeriStruct: AI-assisted Automated Verification of Data-Structure Modules in Verus

arXiv.org Artificial Intelligence

We introduce VeriStruct, a novel framework that extends AI-assisted automated verification from single functions to more complex data structure modules in Verus. VeriStruct employs a planner module to orchestrate the systematic generation of abstractions, type invariants, specifications, and proof code. To address the challenge that LLMs often misunderstand Verus' annotation syntax and verification-specific semantics, VeriStruct embeds syntax guidance within prompts and includes a repair stage to automatically correct annotation errors. In an evaluation on eleven Rust data structure modules, VeriStruct succeeds on ten of the eleven, successfully verifying 128 out of 129 functions (99.2%) in total. These results represent an important step toward the goal of automatic AI-assisted formal verification.



Limitations on Safe, Trusted, Artificial General Intelligence

arXiv.org Artificial Intelligence

Safety, trust and Artificial General Intelligence (AGI) are aspirational goals in artificial intelligence (AI) systems, and there are several informal interpretations of these notions. In this paper, we propose strict, mathematical definitions of safety, trust, and AGI, and demonstrate a fundamental incompatibility between them. We define safety of a system as the property that it never makes any false claims, trust as the assumption that the system is safe, and AGI as the property of an AI system always matching or exceeding human capability. Our core finding is that -- for our formal definitions of these notions -- a safe and trusted AI system cannot be an AGI system: for such a safe, trusted system there are task instances which are easily and provably solvable by a human but not by the system. We note that we consider strict mathematical definitions of safety and trust, and it is possible for real-world deployments to instead rely on alternate, practical interpretations of these notions. We show our results for program verification, planning, and graph reachability. Our proofs draw parallels to Gรถdel's incompleteness theorems and Turing's proof of the undecidability of the halting problem, and can be regarded as interpretations of Gรถdel's and Turing's results.


InvBench: Can LLMs Accelerate Program Verification with Invariant Synthesis?

arXiv.org Artificial Intelligence

Program verification relies on loop invariants, yet automatically discovering strong invariants remains a long-standing challenge. We introduce a principled framework for evaluating LLMs on invariant synthesis. Our approach uses a verifier-based decision procedure with a formal soundness guarantee and assesses not only correctness but also the speedup that invariants provide in verification. We evaluate 7 state-of-the-art LLMs, and existing LLM-based verifiers against the traditional solver UAutomizer. While LLM-based verifiers represent a promising direction, they do not yet offer a significant advantage over UAutomizer. Model capability also proves critical, as shown by sharp differences in speedups across models, and our benchmark remains an open challenge for current LLMs. Finally, we show that supervised fine-tuning and Best-of-N sampling can improve performance: fine-tuning on 3589 instances raises the percentage of speedup cases for Qwen3-Coder-480B from 8% to 29.2%, and Best-of-N sampling with N=16 improves Claude-sonnet-4 from 8.8% to 22.1%.


Technical Perspective: A Symbolic Approach to Verifying Quantum Systems

Communications of the ACM

Exceptional added value may lie in connecting two complementary areas of computer science. This statement is particularly true when applying mature techniques developed in one area to solve complex problems that arise in a new area. The accompanying paper, "An Automata-Based Framework for Verification and Bug Hunting in Quantum Circuits" by Lengรกl et al., is a case in point. It applies techniques developed in logic, automata, and symbolic verification to analyze the correctness of quantum programs. The current quest of quantum computing is achieving quantum supremacy--that is, to reach the point where we solve problems that are practically unsolvable using conventional computing.


RAG-Verus: Repository-Level Program Verification with LLMs using Retrieval Augmented Generation

arXiv.org Artificial Intelligence

Scaling automated formal verification to real-world projects requires resolving cross-module dependencies and global contexts, which are challenges overlooked by existing function-centric methods. We introduce RagVerus, a framework that synergizes retrieval-augmented generation with context-aware prompting to automate proof synthesis for multi-module repositories, achieving a 27% relative improvement on our novel RepoVBench benchmark -- the first repository-level dataset for Verus with 383 proof completion tasks. RagVerus triples proof pass rates on existing benchmarks under constrained language model budgets, demonstrating a scalable and sample-efficient verification.


Enhancing Automated Loop Invariant Generation for Complex Programs with Large Language Models

arXiv.org Artificial Intelligence

Automated program verification has always been an important component of building trustworthy software. While the analysis of real-world programs remains a theoretical challenge, the automation of loop invariant analysis has effectively resolved the problem. However, real-world programs that often mix complex data structures and control flows pose challenges to traditional loop invariant generation tools. To enhance the applicability of invariant generation techniques, we proposed ACInv, an Automated Complex program loop Invariant generation tool, which combines static analysis with Large Language Models (LLMs) to generate the proper loop invariants. We utilize static analysis to extract the necessary information for each loop and embed it into prompts for the LLM to generate invariants for each loop. Subsequently, we employ an LLM-based evaluator to assess the generated invariants, refining them by either strengthening, weakening, or rejecting them based on their correctness, ultimately obtaining enhanced invariants. We conducted experiments on ACInv, which showed that ACInv outperformed previous tools on data sets with data structures, and maintained similar performance to the state-of-the-art tool AutoSpec on numerical programs without data structures. For the total data set, ACInv can solve 21% more examples than AutoSpec and can generate reference data structure templates.


Reviews: Learning Loop Invariants for Program Verification

Neural Information Processing Systems

The paper presents a novel deep network architecture termed DELPHI to automatically infer loop invariants for use in program verification. The architecture takes in as input source code which has (1) a number of assumption or assignment statements, (2) a loop with nested if-else statements with arithmetic operations and (3) a final assertion statement. The output of the architecture is a loop invariant in CNF which holds true at every iteration in the loop, and for which the assertion (3) is true after the loop ends execution. The architecture represents the source code AST using a graph-structured neural network, and treats it as a structured memory which it repeatedly accesses through attention operations. The generation of the CNF invariant is broken up into a sequential decision-making process where at each time the architecture predicts an output (op, T), where op is either && or and T is a simple logical expression.